Evaluating several unsupervised class-selection methods
Abstract
In knowledge discovery from collected databases, one of the first questions that arises is "what should be discovered?" Two lines of work can be followed. In the first, unsupervised learning (usually clustering) is performed, followed by a characterization of the discovered knowledge. In the second, a classifier is constructed for each highly important feature recorded in the database. In this latter approach, the important features are selected using background knowledge supplied by human experts in the domain from which the data were collected. In real domains, however, the large number of features makes it difficult to select those for which classifiers should be constructed. In this study, several measures for ranking class-candidate features are proposed and given a preliminary evaluation on one domain.
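The abstract does not name the proposed ranking measures. The sketch below is for illustration only. It ranks the columns of a discrete table by one hypothetical criterion: the mean mutual information that each column shares with all the others. The assumption behind it is that a feature strongly related to the rest of the database is a more informative class to learn. The functions `entropy`, `mutual_information` and `rank_class_candidates`, as well as the toy data, are illustrative assumptions. They are not the measures evaluated in this study.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned discrete columns."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def rank_class_candidates(table):
    """Rank columns by their mean mutual information with the other columns.

    Hypothetical measure (an assumption, not the paper's): a feature that
    shares more information with the rest of the database is treated as a
    stronger candidate for the class attribute.
    """
    scores = {}
    for name, col in table.items():
        mis = [mutual_information(col, other)
               for other_name, other in table.items() if other_name != name]
        scores[name] = sum(mis) / len(mis)
    # Highest score first; ties keep the table's column order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: "a" and "b" move together, "c" is independent of both.
toy = {
    "a": [0, 0, 1, 1],
    "b": [0, 0, 1, 1],
    "c": [0, 1, 0, 1],
}
print(rank_class_candidates(toy))  # [('a', 0.5), ('b', 0.5), ('c', 0.0)]
```

Mutual information is only one possible choice. Any other dependence score could be substituted in `rank_class_candidates`, for example one measuring how accurately each column can be predicted from the others.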